Abstract

This  paper  reports  on  a  simulation  study  of  social  networks  that  investigated  how  network 
topology  relates  to  the  robustness  of  measures  of  system-level  node  centrality.  This  association  is 
important  to  understand  as  data  collected  for  social  network  analysis  is  often  somewhat 
erroneous  and  may — to  an  unknown  degree — misrepresent  the  actual  true  network. 
Consequently  the  values  for  measures  of  centrality  calculated  from  the  collected  network  data 
may  also  vary  somewhat  from  those  of  the  true  network,  possibly  leading  to  incorrect 
suppositions.  To  explore  the  robustness,  i.e.,  sensitivity,  of  network  centrality  measures  in  this 
circumstance, we conduct Monte Carlo experiments whereby we generate an initial network,
perturb a copy of it with a specific type of error, then compare the centrality measures from the two
instances. We consider the initial network to represent a true network, while the perturbed
copy represents the observed network. We apply a six-factor full-factorial block design for the overall
methodology.  We  vary  several  control  variables  (network  topology,  size  and  density,  as  well  as 
error  type,  form  and  level)  to  generate  10,000  samples  each  from  both  the  set  of  all  possible 
networks  and  possible  errors  within  the  parameter  space.  Results  show  that  the  topology  of  the 
true network can dramatically affect the robustness profile of the centrality measures. We found
that, across all permutations, cellular networks had a nearly identical profile to that of
uniform-random networks, while the core-periphery networks had a considerably different
profile. The centrality measures for the core-periphery networks are highly sensitive to small
levels of error, relative to uniform and cellular topologies. Except in the case of adding edges, as
the error increases, the robustness levels for the three topologies deteriorate and ultimately converge.
1. Motivation

By its nature, social network analysis is burdened by the complexity of both the
underlying subject matter and its data collection procedures. While analysts have made
substantial progress in developing techniques to analyze the data that they have collected,
measurement error in social network data remains an omnipresent problem (Marsden, 1990). In
addition to other intrinsic complications, social network measurement error arises from the
inherent ambiguity of human-informant reliability, whether unintentional (Freeman, Romney, Kimball,
& Freeman, 1987; Killworth & Bernard, 1976) or intentional (Carley, 2003), and from the
intricacy of collection-instrument design.

In  spite  of  this  widely  recognized  problem,  analysts  continue  to  infer  a  great  deal  from  the 
error-prone  network  data.  Of  course,  analysts  derive  important  quantitative  measures  of  the 
network using the data that they have. Unfortunately, the accuracy of their analysis may be
bounded, or at least restricted, by the accuracy of the underlying data they rely
upon. Consequently, analysts, and particularly consumers of information, may be making
misguided judgments based on mistaken analysis derived from flawed source data. Given the
presumed error in the network measures, one can only contemplate how misguided
past analyses may have been and what the impact on subsequent actions may have been.

The  research  herein  is  broadly  motivated  by  the  wide  recognition  that  measurement  error  is 
truly  ubiquitous  in  social  network  data;  yet  we  have  little  understanding  of  the  actual  impact  on 
the  critical  quantitative  measures  we  rely  upon  for  our  analysis.  Irrespective  of  the  past  calls  for 
the  study  of  the  impact  of  this  problem  (Marsden,  1990),  even  vague  attempts  to  address  this 
quandary  are  rare.  While  an  unsuccessful  literature  search  may  brand  the  matter  as  terra 
incognita,  there  have  indeed  been  a  mere  handful  of  pertinent  articles  published  on  the  subject. 
Certainly,  further  exploration  is  warranted. 

Our  research  aims  to  increase  the  knowledge  of  the  impact  of  erroneous  source-data  on 
network  measures.  Specifically,  we  seek  to  understand  the  robustness  of  network  measures  of 
centrality relative to the network’s topology. We project that, given the known characteristics of
a social network (including an a priori estimate of its error characteristics), analysts may ultimately
be able to quantify the impact of these errors on centrality measures and adjust their analyses
accordingly. Analysts may some day obtain more accurate information about
the likely true network rather than relying on the erroneous observed network.

2.  Introduction 

The term robustness as it pertains to social networks has two related, albeit different,
connotations. The robustness of a network is concerned with the reliability (Kim & Medard,
2004) and continued functioning of a network following an intervention. Post-9/11, this is
studied particularly in the context of a destructive attack, purposeful (Tsvetovat & Carley, 2005) or
accidental, on the nodes or connections in a network. The robustness of a network is
particularly relevant in communication-type and flow-oriented networks. Understanding the
robustness of a network in this sense mainly serves the management of the network.

Another connotation of the term robustness, and the one with which we are primarily concerned
herein, is the robustness of the measures of a network. When the term is associated with a measure,
the meaning has more of a statistical connotation. Studying the robustness of a measure of a
network can also be referred to as conducting a sensitivity analysis on the measure. In keeping
with the terminology of the most recently published research in this area, we use the
term robustness in lieu of sensitivity, although the terms can be used
interchangeably.
A measure is robust if a slight perturbation in its input produces a slight change in its output.
Robustness  is  clearly  desirable  in  a  measure,  as  input  data  is  seldom  without  error,  and 
robustness  implies  that  the  measure’s  output  for  the  true  data  (the  input  data  without  error)  is 
nearly  equal  to  that  of  the  data  with  error. 

Beyond investigating the robustness of a measure in general, our attention is drawn to the robustness
of measures of centrality of a network because the notion of centrality is one of the first
(Moreno, 1934) and foremost measures social analysts concern themselves with. A network
actor’s centrality is often associated with the level of their prestige and power relative to other
actors, which is a key question of even the most basic actor-level social network analysis. At the
overall  network  level  of  analysis,  or  sub-group  level,  identifying  the  group  of  actors  with  the 
highest  values  in  the  various  measures  of  centrality  is  also  a  common  activity. 

There are numerous perspectives on centrality in a network, as evidenced by the multitude of
measures formulated and substantiated in the literature. Freeman’s (1979) seminal essay in the first
volume of Social Networks seemingly introduced the core concept to the social network
community. In this study, we scrutinize four specific measures of centrality chosen
because of their prominence in network analysis: degree, closeness, betweenness,
and eigenvector centralities. (These four measures are also core to a prior robustness
study whose methodology we follow closely; see our Method section for more
information.)

Consistent with the foundational importance of centrality to social network research,
past studies of robustness (again, as we regard the term) have to date focused exclusively on measures
of centrality as opposed to any other group of measures or individual measure. Although the total number
of robustness-specific studies is somewhat limited, there are both empirically based and
simulation-based studies to consider.

One  approach  an  analyst  may  take  to  address  inherent  measurement  error  is  to  use 
comprehensive  statistics  (Frank,  1971)  or  one  of  several  sampling  techniques  (Erickson  & 
Nosanchuk,  1983;  Frank,  1978;  Frank,  1981;  Galaskiewicz,  1991;  Granovetter,  1976)  on  the 
observed  network  to  make  “reasonable,  if  not  excellent”  (Galaskiewicz,  1991,  p.  347)  estimates 
of  the  actual  centrality  measures  for  the  true  network. 

In a recent combined meta-analysis and simulation study utilizing empirical social network
data obtained from 8 independently conducted studies, Costenbader and Valente (2003) analyzed
11 measures of centrality for their robustness to simulated network data error (see footnote 2). They concluded
that under “some circumstances” (p. 305) analysts may still use measures calculated from data
that has missing information. They continued, however, “The results of this study should be
interpreted with caution...” (p. 305) and warned of limitations to the generalization of their
findings.

Another  combined  case  study — specifically,  a  16,726  node  collaboration  network — and 
simulation  by  Kossinets  (2005)  also  investigated  data  error  by  applying  random  error  to 
empirical  data.  Kossinets  found  that  errors  such  as  those  resulting  from  boundary  specification 
(including  only  a  subset  of  relevant  nodes)  can  significantly  alter  network-level  statistics  such  as 
average  degree  centrality,  clustering,  and  other  measures. 


2  Actually,  Costenbader  and  Valente  (2003)  drew  repeated  random  samples  (subsets)  from  the  existing  network  data 
in  order  to  investigate  sampling  techniques  (Rothenberg,  1995).  We  posit  that  their  approach  is  congruent  with 
investigating  data  set  error. 
In this study we seek to develop more generalized findings than specific case-based studies
allow. To accomplish this, we take advantage of readily available computing power to randomly
sample both the true network and the observed network, as opposed to prior studies that used an
empirical data set as the true network and sampled only the observed network from it.

The Borgatti, Carley, and Krackhardt (in press) study, which is a template for the methodology of
this study, explored the robustness of four centrality measures (degree, betweenness, closeness and
eigenvector) in random graphs and had several findings, including: (a) measure accuracy declines
predictably with increasing error, (b) the measures have a similar robustness pattern and level,
(c) the type of error (node / edge) has little effect on the robustness level, and (d) increasing
density reduces accuracy of the measures, except for edge-addition where accuracy increased.

As  is  often  the  case,  initial  studies  on  networks  are  conducted  on  the  uniform  random 
network  model  attributed  to  Erdos  and  Renyi  (1959).  Following  tradition,  Borgatti,  Carley  and 
Krackhardt conducted their research using ER graphs. We extend their work by exploring how
relevant a network’s topology is to the findings of Borgatti, Carley and Krackhardt. Since
social  networks  are  rarely  of  the  uniform  random  network  variety,  we  investigate  the  robustness 
of  measures  of  centrality  under  different  network  topologies.  The  three  topologies  we  study  are: 
uniform,  cellular,  and  core-periphery  (Borgatti  &  Everett,  1999). 

Cellular networks consist of a collection of distributed, sparsely interconnected,
tightly coupled cells, a.k.a. groups, that are often small, (if functional) operate
independently of one another, and can be somewhat self-similar in their form. Core-periphery
networks are those that have a single primary cohesive core group that is sparsely tied to a
periphery of others that often are not tied to anyone beyond those in the core.

3.  Method 

This  study  borrows  its  methodology  directly  from  the  experiments  recently  conducted  by 
Borgatti,  Carley  and  Krackhardt  (in  press).  To  evaluate  the  robustness  of  four  measures  of 
centrality,  they  conducted  a  multitude  of  experimental  trials  using  simulated  relational  data  in  the 
form  of  uniformly-random  networks,  i.e.,  Erdos  and  Renyi  (1959)  uniform  graphs.  For  each 
replication,  they  generated  a  true  network  that  was  effectively  randomly  drawn  from  the 
complete  ensemble  of  possible  networks  based  on  several  control  parameters.  This  true  network 
was  then  systematically  perturbed  (effectively  drawing  randomly  from  the  realm  of  all 
possibilities),  resulting  in  a  corresponding  observed  network,  i.e.,  the  true  network  with 
simulated measurement errors. The value differences between the corresponding centrality
measures for the network-pair were then evaluated. The experiment was controlled under a
factorial  design  in  which  they  varied  several  parameters  that  characterized  the  generation  of  the 
true  and  observed  network-pairs. 

Herein,  we  duplicate  the  prior  design  and  expand  on  the  analysis  by  introducing  network 
topology  as  an  additional  control  parameter,  i.e.,  independent  variable.  In  essence,  we  simply 
add a dimension to the factorial design, which allows us to investigate the relationship
between network topology and the robustness of the centrality measures. We control for network
topology  by  systematically  varying  the  generation  of  the  true  network  across  these  three  network 
forms:  uniform,  cellular,  and  core-periphery. 

Note:  Readers  familiar  with  the  design  of  the  Borgatti,  Carley  and  Krackhardt  (in  press) 
experiment  may  choose  to  pass  over,  or  merely  skim,  the  remainder  of  this  section  as  we  provide 
similar  methodological  information;  although  herein,  somewhat  different  terminology  is 
employed  and  different  aspects  of  the  design  are  accentuated. 
3.1 Six-Factor Full-Factorial Block Design

The overall approach incorporates a six-factor (8x5x4x3x2x2) full-factorial block
design totaling 1,920 independent trials. The factors we control for, i.e., control variables, and
the  respective  number  of  values  (in  parentheses)  for  each  are:  network  topology  (3),  network  size 
(4),  network  density  (8),  error  type  (2),  error  form  (2),  and  error  level  (5).  Given  a  factorial 
design,  a  particular  trial  is  characterized  by  a  combination  of  the  six  control  variables,  each 
systematically  assigned  one  of  their  respective  possible  values.  For  example,  one  specific  trial  is 
described  as  “topology=cellular,  size=100,  density=30%,  error  type=node,  error  form=remove, 
and  error  level  =  10%.” 
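
As an illustration only, the trial count can be checked by enumerating every combination of the factor levels listed in Table 1. The sketch below is not the software used in the study; the dictionary keys and the function name enumerate_trials are hypothetical names chosen for this example, and percentages are written as fractions.

```python
from itertools import product

# Factor levels as listed in Table 1 (percentages written as fractions).
# The key names are illustrative assumptions, not identifiers from the study.
FACTORS = {
    "topology": ["uniform", "cellular", "core-periphery"],
    "size": [10, 25, 50, 100],
    "density": [0.01, 0.02, 0.05, 0.10, 0.30, 0.50, 0.70, 0.90],
    "error_type": ["node", "edge"],
    "error_form": ["add", "remove"],
    "error_level": [0.01, 0.05, 0.10, 0.25, 0.50],
}


def enumerate_trials(factors=FACTORS):
    """Return one dict per trial, covering every combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


trials = enumerate_trials()
print(len(trials))  # 3 * 4 * 8 * 2 * 2 * 5 combinations
```

Each element of trials corresponds to one cell of the factorial design, such as the cellular, 100-node, 30% density, node-removal, 10% error example given above.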

From this factorial design, a six-dimensional results matrix is ultimately
assembled. Each dimension corresponds to one of the six control variables. Each element of the
results matrix contains a group of summary statistics, e.g., the mean, that is calculated using
data  values  from  the  set  of  replications  underlying  that  trial.  It  is  this  completed  results  matrix 
that  is  the  principal  focus  of  the  results  analysis  we  present  later  in  the  report. 

Each  trial  is  replicated  10,000  times  under  identical  conditions,  i.e.,  using  the  same  assigned 
values  for  the  six  control  variables.  Each  replication  is  an  experimental  unit  conducted  entirely 
independent of any other. The outcome of each replication is a set of values for basic centrality
measures, which ultimately contribute to the robustness summary statistics of the trial of which it is
a part.

3.2  Control  Variables 

For the purpose of our discussion, we assign each of the 6 independent control variables
to one of two classes: (a) Network Class or (b) Error Class. Each variable is
classified  according  to  its  connection  to  either  the  construction  of  the  initial  true  network,  or  to 
creating  the  perturbations  leading  to  the  formation  of  the  observed  network. 


Table 1. Independent variables and assigned values differentiating trials

Independent Variable    Number of Values    Assigned Values
----------------------------------------------------------------------------------
Network Class
  Topology              3                   uniform, cellular, core-periphery
  Size (a)              4                   10, 25, 50, 100
  Density (b)           8                   1%, 2%, 5%, 10%, 30%, 50%, 70%, 90%
Error Class
  Type                  2                   node, edge
  Form                  2                   add, remove
  Level (c)             5                   1%, 5%, 10%, 25%, 50%

(a) specifies the number of nodes in the true network
(b) specifies the number of edges in the true network relative to its number of nodes
(c) specifies the number of edges or nodes (added or removed) relative to the original
true network


CMU  SCS  ISRI 




CASOS  Report 


The control variables in the Network Class are: topology, size, and density. Variables in this
class represent a defining characteristic of the true, a.k.a. original, network. Possible values for
the network topology (the network-level structure) are: uniform, cellular, and
core-periphery. Possible values for network size (the number of nodes making up the
network) are: 10, 25, 50, and 100. Possible values for the network density (the
edge density of the overall network) are: 1%, 2%, 5%, 10%, 30%, 50%, 70%, and 90%.

The  control  variables  of  the  Error  Class  are:  type,  form  and  level.  Each  variable  in  this  class 
refers  to  the  manner  in  which  the  true  network  is  systematically  perturbed.  Recall,  the  original 
true  network  is  randomly  changed  (within  systematically  specified  parameters),  thus  creating  the 
new observed network. Possible values for the error type are: node and edge. Possible values for
the error form (the addition or removal of the error type, i.e., nodes or edges) are:
add and remove. Possible values for the error level (the percentage of nodes or edges
being added or removed) are: 1%, 5%, 10%, 25%, and 50%.

3.3  Network-Pairs 

Each  replication  within  experimental  trials  involves  a  single,  unique  network-pair.  Each 
network-pair consists of one true network and one observed network. The true network represents
the “truth” of the network under study, while the observed network represents the data actually
collected for that same true network. Using this network-pair paradigm provides an opportunity
to  identify  and  inspect  the  precise  differences  between  the  true  and  the  observed  at  the  detail 
level  (we  can  identify  specific  nodes  and  edges  that  are  changed,  or  in  error)  and  according  to 
any  measures  from  either  case. 

We create the true network by randomly drawing a single network from the set
of all possible realizations that can be constructed from the specific characteristics designated by
the control parameters (topology, number of nodes, and density) for the trial. This network is
labeled the true network for this network-pair. An exact copy of this true network is then
perturbed according to the other control variables for the experiment (error type, form and level).
This true-but-now-changed network is labeled the observed network. More detail on the creation
of these networks appears later in this section.

3.4  Measures  of  Centrality 

We  evaluate  network  centrality  as  a  generalized  concept  by  assessing  four  specific  node-level 
centrality  measures  which  are  instrumental  to  most  social  network  analysis.  For  each  node  in  a 
given  network  (true  or  observed),  we  gauge:  degree,  betweenness,  closeness,  and  eigenvector 
centralities. 

To  calculate  the  four  values  for  each  node,  we  make  use  of  ORA  (Carley  &  Reminga,  2004), 
which is network-statistics software that is established in the network analysis field. After the
values have been calculated for each of the nodes, ORA provides, for each of the four measures,
a ranked list of nodes, ordered from the node with the highest value to the lowest.

This process results in four separate ordered lists of nodes for each network in the
network-pair. We refer to the ordered lists formed from the true network as the true centrality lists and
to those from the observed network as the observed centrality lists. Using the
corresponding true and observed centrality lists, we can calculate the congruence of the four
centrality measures across the network-pair, i.e., the corresponding true and observed
networks.
3.5 Calculating Congruence

From  respective  true  and  observed  centrality  lists  for  each  network-pair,  we  calculate  the 
congruence in five ways: (a) Top 1, (b) Top 3, (c) Top 10%, (d) Overlap, and (e) R-Squared.
These  each  provide  an  indication  of  the  congruence  between  the  true  and  the  observed  networks 
in  terms  of  their  four  centrality  measures.  If  both  of  the  centrality  lists  making  up  a  centrality- 
pair  are  identical  in  all  aspects,  they  would  be  perfectly  congruent.  Three  of  the  congruence 
measures  are  binary  values  while  two  are  real  values  from  0  to  1,  inclusive. 

•  Top 1 is a binary value that reflects whether (=1) or not (=0) the top-ranked node in the
true network is also the top-ranked node in the observed network.

•  Top 3 is a binary value that reflects whether (=1) or not (=0) the top-ranked node in the
true network is one of the top 3 nodes in the observed network.

•  Top 10% is a binary value that reflects whether (=1) or not (=0) the top-ranked node in
the true network is also ranked in the top 10% (of nodes) in the observed network.

•  Overlap is a real value between 0 and 1 (inclusive) that reflects the extent to which the
top 10 nodes in the true network match the top 10 nodes in the observed network. The
formula for this ratio is: N(T_t ∩ T_o) / N(T_t ∪ T_o), where T_t and T_o are the sets of
top 10 nodes in the true and observed networks and N(.) is the number of nodes in a set.

•  R-Squared is the squared value of the Pearson correlation between the true and the
observed centrality measures. Nodes not found in both the true
and the observed networks are excluded from this statistic.
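
The five congruence values above can be sketched in a few lines of Python. This is a minimal illustration and not the ORA-based implementation used in the study. It takes two dictionaries mapping node identifiers to centrality scores. Several details are assumptions made only for the sketch: ties are broken by node identifier, the top 10% cutoff is rounded up to at least one node, and R-Squared is reported as 0 when either list has no variance. The names rank_nodes, pearson_r, and congruence are hypothetical.

```python
def rank_nodes(scores):
    """Order nodes from highest to lowest score (ties broken by node id: an assumption)."""
    return sorted(scores, key=lambda node: (-scores[node], node))


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 for empty or zero-variance input (an assumption)."""
    n = len(xs)
    if n == 0:
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def congruence(true_scores, obs_scores, top_k=10):
    """Compute Top 1, Top 3, Top 10%, Overlap, and R-Squared for one centrality measure."""
    true_rank = rank_nodes(true_scores)
    obs_rank = rank_nodes(obs_scores)
    top_true = true_rank[0]
    # Top 10% of the observed nodes, rounded up, with at least one node (an assumption).
    cutoff = max(1, (len(obs_rank) + 9) // 10)
    t_set, o_set = set(true_rank[:top_k]), set(obs_rank[:top_k])
    # Only nodes present in both networks enter the correlation.
    common = sorted(set(true_scores) & set(obs_scores))
    xs = [true_scores[node] for node in common]
    ys = [obs_scores[node] for node in common]
    return {
        "top1": int(top_true == obs_rank[0]),
        "top3": int(top_true in obs_rank[:3]),
        "top10pct": int(top_true in obs_rank[:cutoff]),
        "overlap": len(t_set & o_set) / len(t_set | o_set),
        "r_squared": pearson_r(xs, ys) ** 2,
    }
```

Calling congruence once per centrality measure for a network-pair yields the values that are later summarized across the replications of a trial.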

For  each  trial  (each  consisting  of  10,000  replications)  basic  summarization  statistics  of  the 
congruence  measures  are  recorded,  including:  minimum  value,  maximum  value,  average  value 
(the  arithmetic  mean),  and  standard  deviation.  These  summary  statistics  are  determined  for  each 
of  the  measures  of  robustness  and  represent  a  quantitative  perspective  on  the  congruence  of  the 
centrality  measures  for  a  given  experimental  trial. 

3.6  Determining  Robustness 

A  measure  is  robust  if  a  small  change  in  its  input  value(s)  produces  only  a  slight  change 
in its output value. In this study we examine the robustness of the measures of network centrality;
that is, given different levels of error in the input data, how much the values
for the centrality measures differ.

To determine and quantify robustness, we investigate the summarized congruence values
vis-a-vis the combinations of values for the control variables in the Network Class and the Error Class. We form
tables  and  corresponding  line  graphs  to  facilitate  analysis,  from  which  we  draw  our  findings  and 
conclusions. 

3.7  Generating  the  True  Network 

We apply the Network Class control variables to specify the true network for each
replication; the replications are statistically independent of one another. This involves assigning one value,
based on the specific trial with which the replication is affiliated, to each of the three
characteristics (network topology, network size, and network density) to characterize the
specific population of networks from which the true network is essentially randomly drawn.

In  practice,  we  actually  generate  the  drawn  network  by  using  software  written  specifically  for 
this  study  that  follows  the  algorithm(s)  described  in  Frantz,  Airoldi,  and  Reminga  (2005).  Using 
the topology-specific generation procedure, we ultimately embody the true network in a
DyNetML  (Tsvetovat,  Reminga,  &  Carley,  2004)  formatted  data  file  (optionally  held  in  CPU 
memory,  or  persisted  via  a  disk  file). 

The  network  size  control  variable  is  assigned  a  single  value  from  this  list:  10,  25,  50,  and 
100.  This  refers  to  the  exact  number  of  nodes  making  up  the  true  network.  We  limit  ourselves 
to  these  smaller-sized  networks  as  they  correspond  in  order  of  size  with  many  social  network 
studies  whose  data  is  collected  through  survey  or  some  subject  response  vehicle,  as  opposed  to  a 
study  that  can  collect  data  in  an  automated  form. 

The  network  density  control  variable  is  assigned  a  single  value  from  this  list:  1%,  2%,  5%, 
10%,  30%,  50%,  70%,  and  90%.  This  value  is  used  in  a  formula  to  determine  the  number  of 
edges  in  the  network. 

Following the network topology control variable, we characterize the topology for each true
network as one of three forms: uniform, cellular, or core-periphery. The “uniform” network is an Erdos and
Renyi (1959) graph (also called a “Bernoulli” graph, among other names) that begins with a
fixed number of nodes (herein based on the network size control variable) and systematically
evaluates each possible node pair to determine whether a tie between the two nodes
will exist, based on a fixed probability. In our experiments, this probability is
fixed to a value according to a formula based on the network density control variable.
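
The uniform generation step described above can be sketched as follows. This is an illustrative stand-in, not the generator described in Frantz, Airoldi, and Reminga (2005). The function name uniform_random_graph is hypothetical, and the sketch assumes the tie probability p is simply set equal to the target density, since the exact formula is not reproduced here.

```python
import random
from itertools import combinations


def uniform_random_graph(n, p, rng=None):
    """Return the edge set of an undirected graph on nodes 0..n-1.

    Every possible node pair is evaluated once and kept with fixed
    probability p (assumed here to equal the target density).
    """
    rng = rng or random.Random()
    return {pair for pair in combinations(range(n), 2) if rng.random() < p}


# Example: one draw of a 25-node network with a 30% tie probability.
edges = uniform_random_graph(25, 0.30, random.Random(42))
```

Passing a seeded random.Random instance makes a draw repeatable, which is convenient when checking a single replication.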

The “cellular” network is further characterized by a parameter provided to the software,
called “mean cell size”, that we fixed at a value of 6. This is the average number of nodes
making up a single cell in the network; each cell is tied together as a complete clique.

The “core-periphery” network is further characterized by a parameter provided to the
software, called “alpha”, that we set to a value of 6. The number of edges a node has is
proportional to its attribute vector value. In a network with N total nodes, node k’s attribute
vector value = 10*N / (k+1)^alpha.

3.8  Generating  the  Observed  Network 

To  generate  data  for  the  observed  network  of  the  network-pair,  the  true  network  is  purposely 
perturbed.  Depending  on  the  Error  Class  parameters  designated,  we  add  (or  remove)  nodes  and 
edges  from  the  true  network  dataset  to  create  a  new,  observed,  dataset.  Within  the  parameters, 
these  perturbations  are  random  following  a  well  defined  heuristic,  described  below. 

In  the  case  of  adding  nodes,  new  nodes  are  added  according  to  the  number  specified  by  the 
error-level  parameter  (values:  1%,  5%,  10%,  25%,  and  50%).  To  determine  the  number  of  new 
nodes  the  error-level  parameter  is  multiplied  by  the  value  of  the  network-size  parameter  (values: 
10,  25,  50,  and  100).  For  example,  in  the  case  of  a  true  network  with  100  nodes,  to  be  perturbed 
by  adding  nodes  to  the  level  of  5%,  we  would  add  5  nodes  to  the  true  network  to  generate  the 
observed network. Further, edges are also added to the new nodes to avoid
simply adding a set of isolate nodes. To add these edges, for each new node, another node is randomly
selected from the existing nodes, and edges are added to the new node
until its number of edges matches that of the sampled node. The alters for these edges are randomly selected
nodes that do not already have an edge to the new node.
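
A minimal sketch of the node-addition heuristic follows. It is not the study's software, and it fills in details the text leaves open with explicit assumptions: nodes are integers, the number of new nodes is rounded with Python's round, the sampled "existing" node and the alters are drawn from the original true-network nodes only, and edges are stored as (smaller, larger) tuples. The function name add_nodes is hypothetical.

```python
import random


def add_nodes(nodes, edges, level, rng=None):
    """Return (nodes, edges) of an observed network with extra nodes added.

    nodes: non-empty iterable of integer node ids in the true network
    edges: set of (u, v) tuples with u < v
    level: error level as a fraction, e.g. 0.05 for 5%
    """
    rng = rng or random.Random()
    original = sorted(nodes)
    degree = {v: 0 for v in original}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    new_nodes = list(original)
    new_edges = set(edges)
    n_new = round(level * len(original))  # rounding rule is an assumption
    next_id = max(original) + 1
    for i in range(n_new):
        new = next_id + i
        # Copy the degree of a randomly sampled original node.
        template = rng.choice(original)
        # Sampling without replacement guarantees no alter is tied twice.
        alters = rng.sample(original, degree[template])
        new_edges.update((alter, new) for alter in alters)
        new_nodes.append(new)
    return new_nodes, new_edges
```

With a 100-node true network and a 5% error level, this adds 5 nodes, each with as many ties as a randomly sampled original node.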

In  the  case  of  adding  edges,  new  edges  are  added  according  to  the  number  specified  by  the 
error-level  parameter  (values:  1%,  5%,  10%,  25%,  and  50%).  To  determine  the  number  of  new 
edges, the error-level parameter is multiplied by the number of edges in the true network (this
is determined by the network size multiplied by the density). As in the edge-addition
step when adding nodes, already existing edges are not strengthened.
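
The edge-addition case can be sketched in the same spirit. Again this is an illustration under stated assumptions rather than the study's code: the number of new edges is the error level times the actual edge count of the true network, rounded with Python's round and capped at the number of absent node pairs, and new edges are chosen uniformly at random from the absent pairs so that no existing tie is duplicated. The function name add_edges is hypothetical.

```python
import random
from itertools import combinations


def add_edges(nodes, edges, level, rng=None):
    """Return the edge set of an observed network with spurious edges added.

    nodes: iterable of sortable node ids in the true network
    edges: set of (u, v) tuples with u < v
    level: error level as a fraction, e.g. 0.25 for 25%
    """
    rng = rng or random.Random()
    existing = set(edges)
    # Candidate ties are node pairs with no edge yet, so nothing is "strengthened".
    candidates = [pair for pair in combinations(sorted(nodes), 2)
                  if pair not in existing]
    n_new = min(round(level * len(existing)), len(candidates))
    return existing | set(rng.sample(candidates, n_new))
```

For a true network with 20 edges and a 25% error level, the observed network produced by this sketch has 25 edges, 20 of which are the original ties.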
As with the true network, the observed network is represented in DyNetML (Tsvetovat,
Reminga,  &  Carley,  2004)  and  stored  in  computer  memory  and  persisted  on  computer  disk. 

3.9  Procedure 

While  the  true  networks  are  considered  to  be  drawn  uniformly  at  random  from  all 
possibilities given the Network Class variables, in practice the true networks are
generated according to parameters consisting of the Network Class variables and by computer code
applying algorithms specific to the desired network topology. The result is the same regardless
of  the  actual  heuristic  used,  i.e.,  a  network  with  the  expected  values  of  the  Network  Class 
variables  is  presented  randomly  from  all  the  possibilities  constrained  by  values  of  the  Network 
Class  variables. 

•  Repeat for each value (n=3) of network topology
   o  Repeat for each value (n=4) of network size
      ■  Repeat for each value (n=5) of network density
         •  Repeat for each value (n=2) of error type
            o  Repeat for each value (n=2) of error form
               ■  Repeat for each value (n=6) of error level
                  •  Replicate 10,000 times for each trial
                     o  Generate true network based on Network Class control variables
                     o  Perturb the true network to create an instance of an observed
                        network based on Error Class control variables, creating a
                        network-pair
                     o  Compute centrality ranking lists for both the true and the
                        observed networks
                     o  Calculate pair-wise congruence values for each of the four
                        centrality ranking lists
                  •  Calculate summary statistics for the congruence values for each
                     of the four centrality measures (n=10,000): minimum, maximum,
                     mean, standard deviation.
•  Report summary statistics across all experiment blocks (n=1,440)

Fig.  1.  Pseudo  code  summarizing  the  factorial  block  procedures  of  the  methodology 
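The factorial structure in Fig. 1 can also be sketched in code. The snippet below only enumerates the experiment blocks and summarizes replicated congruence values; it is not the authors' software. The density and error-level entries are placeholders because their full value lists are not given in this section, `trial` is a hypothetical callable standing in for the generate/perturb/compare steps, a single congruence value per replication is used for brevity, and the sample standard deviation is an assumption.

```python
from itertools import product
from statistics import mean, stdev

TOPOLOGIES = ["uniform", "core-periphery", "cellular"]
SIZES = [10, 25, 50, 100]
DENSITIES = [f"density_{i}" for i in range(5)]     # placeholders for the 5 density values
ERROR_TYPES = ["node", "edge"]
ERROR_FORMS = ["add", "remove"]
ERROR_LEVELS = [f"level_{i}" for i in range(6)]    # placeholders for the 6 error levels


def design_blocks():
    """Yield one tuple per experiment block of the six-factor full-factorial design."""
    return product(TOPOLOGIES, SIZES, DENSITIES, ERROR_TYPES, ERROR_FORMS, ERROR_LEVELS)


def summarize(values):
    """Summary statistics reported for each block."""
    return {"min": min(values), "max": max(values),
            "mean": mean(values), "sd": stdev(values)}


def run_block(block, trial, replications=10_000):
    """Replicate `trial(block)` and summarize the resulting congruence values.

    `trial` is a hypothetical callable that generates a true network, perturbs it,
    and returns one congruence value for the resulting network pair.
    """
    return summarize([trial(block) for _ in range(replications)])
```

Enumerating `design_blocks()` yields 3 x 4 x 5 x 2 x 2 x 6 = 1,440 blocks, matching the count in Fig. 1.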
4.  Results

We report five specific observations stemming from our review of the entire set of results,
although we present them only for true networks of 100 nodes with 50% density (100/50%).
While this section presents only selected charts chosen to illustrate the five observations, the
entire set of result tables for the 100/50% true network experiments is provided in the
Appendix, and the data for all experiments are available from the authors.

We  present  the  detail  of  our  observations  in  the  case  of  a  single  true  network  to  keep  the 
text  undemanding.  We  consider  this  approach  suitable  since  the  100/50%  case  provides  a 
neutral  configuration  for  comparing  the  sensitivity  of  the  centrality  measures;  Borgatti, 
Carley  and  Krackhardt  (in  press)  establish  that  when  evaluating  robustness  50%  density  is  an 
impartial  point  relative  to  the  type  of  errors  (node/edge  add/remove). 

Our first observation is the prominent similarity of the accuracy scores across the four
centrality measures, regardless of the experiment parameters (true network and error control
variables). Throughout the experiment, under the same true/observed network parameters,
degree, betweenness, closeness, and eigenvector centrality all showed remarkably similar
values for measure accuracy (with one exception to be pointed out later). In every
combination of topology, size, density, error type and error form, the robustness profile
across the measures is comparable. As expected, the actual accuracy values differed from one
congruence measure (top1, top3, top10pct, etc.) to another, but for any given congruence
measure the values were consistent across the four centrality measures.


[Figure 2: four panels for the 100-node, 50%-density true network with edge-remove errors, top congruence measure: (a) Degree Centrality, (b) Betweenness Centrality, (c) Closeness Centrality, (d) Eigenvector Centrality. Horizontal axis: error level. Lines: corePeriphery, uniform, cellular.]

Fig.  2.  Cross-cut  of  four  centrality  measures  showing  comparable  accuracy  profiles.

As an example, Fig. 2 shows this phenomenon for the 100/50% network with edge-remove
errors for the top congruency measure. Each line on the chart represents the accuracy
between the true and the observed network for a given topology at different error levels. The
similarity of the four charts is readily seen and provides a simple visual illustration of the
phenomenon. Fig. 2 also shows that the uniform and cellular lines lie atop one another; this
is discussed separately below.

To further keep the text undemanding, we limit our presentation of results from this point
forward to those for degree centrality only. Degree centrality is certainly the least
complicated of the measures to think about; as established by the first observation, all
observations about degree centrality can be equally ascribed to the betweenness, closeness, and
eigenvector centrality measures (again, with one exception to be pointed out later). Readers
should consider degree centrality as a proxy for the other centrality measures.

A second observation, as noted briefly above, is that the uniform and cellular topologies
have nearly identical accuracy results from every view of the data. In any plot of the results,
the lines for the two most often lie atop one another. In many cases the average accuracy
values are identical; there are numerous exceptions, but the differences are usually limited to
a tenth of a percentage point. Figure 2, used to support the first observation, also provides an
example of the near-duplication of the results for the uniform and cellular topologies. In some
instances the two do display as separate lines, albeit very slightly. For this reason, we will
continue to show the three topologies on graphs, but will generally discuss uniform and
cellular as a united pair.


[Figure 3: four panels for the 100-node, 50%-density true network, degree centrality: (a) Node Remove, (b) Node Add, (c) Edge Remove, (d) Edge Add. Horizontal axis: error level.]

Fig.  3.  Degree  centrality  showing  average  accuracy  of  top3  congruency  measure

Our third observation is that, in two of the four error type/forms, the core periphery topology
consistently shows significantly more accuracy than the uniform and cellular topologies.
In the specific cases of node-remove and edge-add errors, core periphery has much more
accuracy than the other topologies when any error is introduced. In these cases core periphery
seems to have highly robust measures of centrality, seemingly immune to network data errors.
Conversely, in the contrary cases of node-add and edge-remove, core periphery has markedly
less accuracy as soon as any error is introduced.

Figure 3 shows an example of the distinctive accuracy profiles of core periphery and
uniform/cellular topologies for the degree centrality / top3 congruence measure. Removing nodes
has little impact on a core periphery network and is hardly noticeable until the error level
reaches 50%. Similarly, adding edges has little effect, if any, on the core periphery networks.
However, removing edges has a dramatic effect on the degree / top3 measure immediately upon
any introduction of error, as does adding nodes. The profile for the uniform/cellular networks
is noticeably opposite to that of the core periphery network, although the slope of the accuracy
line is somewhat consistent for all four error type/forms. Notice, again, that the uniform and
cellular lines lie almost exactly atop one another in this set of graphs.

[Figure 4: two panels for the 100-node, 50%-density true network with edge-add errors, degree centrality, one line per congruence measure: (a) Uniform Topology, (b) Core Periphery.]

Fig.  4.  Plots  of  degree  centrality  showing  average  accuracy  for  edge-add  errors.

Our next observation, the fourth, is that core periphery networks show extremely high levels of
accuracy for edge-add errors. The accuracy is so high that the errors appear practically
inconsequential. Figure 4(a) shows the accuracy for the uniform and cellular networks
deteriorating monotonically as error level increases. In contrast, as Fig. 4(b) shows, the
average accuracy levels for core periphery networks remain consistently well above 0.95 for each
of the congruence measures as error level increases. In Fig. 4, each line is a different
congruence measure: top, top3, top10pct, overlap (labeled soc cir) and R-Squared (labeled
Icorrel), pertaining to the degree centrality measure.
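The congruence measures named above are defined earlier in the paper, outside this section. As a rough sketch only, the code below assumes the following definitions for a single true/observed network pair: top1 and top3 ask whether the most central node of the true network ranks first, or among the first three, in the observed network; top10pct asks whether it falls in the observed top 10%; overlap is the shared fraction (intersection over union) of the two top-10% sets; and R-squared is the squared Pearson correlation of the scores. The function names and the restriction to nodes present in both networks are likewise assumptions.

```python
def top_k(scores, k):
    """Return the set of the k highest-scoring nodes (ties keep input order)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def congruence(true_scores, observed_scores):
    """Compare two centrality score dicts (node -> score) for one network pair."""
    # Assumption: only nodes present in both networks are compared.
    shared = [v for v in true_scores if v in observed_scores]
    t = {v: true_scores[v] for v in shared}
    o = {v: observed_scores[v] for v in shared}
    n = len(shared)
    best_true = max(t, key=t.get)              # most central node in the true network
    k10 = max(1, (n + 9) // 10)                # size of the top 10%, rounded up
    top10_t, top10_o = top_k(t, k10), top_k(o, k10)

    # Squared Pearson correlation between true and observed scores.
    mean_t, mean_o = sum(t.values()) / n, sum(o.values()) / n
    cov = sum((t[v] - mean_t) * (o[v] - mean_o) for v in shared)
    var_t = sum((t[v] - mean_t) ** 2 for v in shared)
    var_o = sum((o[v] - mean_o) ** 2 for v in shared)
    r_squared = cov * cov / (var_t * var_o) if var_t and var_o else float("nan")

    return {
        "top1": int(best_true in top_k(o, 1)),
        "top3": int(best_true in top_k(o, 3)),
        "top10pct": int(best_true in top10_o),
        "overlap": len(top10_t & top10_o) / len(top10_t | top10_o),
        "r_squared": r_squared,
    }
```

Averaging each returned value over the replications of a block would give accuracy figures of the kind plotted in the charts, under the assumed definitions.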

Further to the fourth observation, one particular case is relevant to point out. While the
accuracy levels of the four centrality measures otherwise deviate little from one another, Fig. 5
shows the accuracy of the eigenvector measure for core periphery networks as being
uncharacteristically different from that of the other centrality measures. This is in stark
contrast to the consistency of the measures in other scenarios.

The fifth observation is that the core periphery network, with node-add or edge-remove
errors, is highly sensitive to small errors relative to the sensitivity of uniform/cellular
networks at the same error levels. As shown in Fig. 6, the accuracy of the top and top3
congruency measures drops sharply with errors of 1-5%. At the 10% error level, the accuracy has
already approached an apparent asymptote, while other topologies have more of a linear
trajectory under the same conditions. Core periphery (plot b) shows significantly less
robustness than the uniform topology at particularly small levels of error. Each line is a
different congruence measure: top, top3, top10pct, overlap (labeled soc cir) and R-Squared
(labeled Icorrel).

[Figure 5: core periphery network (100/50%) with edge-add errors, one line per centrality measure (legend partly lost: degCent, betCent, clsCent).]

Fig.  5.  Average  accuracy  values  for  core  periphery  (100/50%)  network  with  edge-add  errors

[Figure 6: two panels: (a) Uniform Topology, (b) Core Periphery.]

Fig.  6.  Accuracy  profile  for  degree  centrality  by  congruence  measures  for  uniform  and  core
periphery

5.  Discussion 

The observations arising from our experiments give insight into the connection between a
network's topology and the robustness of centrality measures. Our observations are consistent
with prior published research and, further, present early evidence of a relevant relationship
between topology and the measures' robustness. In short, we believe that there may indeed be an
important relationship, albeit an as yet ambiguous one, between network topology and the
robustness profile of common centrality measures. In our experiments, the uniform and cellular
topologies have nearly identical robustness profiles, while the core periphery topology has a
very different robustness profile that is clearly distinct from that of the other two topologies
considered.

In the remainder of this section, we discuss the new-found evidence to substantiate our claim.
As we provide only speculation in attempting to explain our observations, each observation
warrants its own targeted study to confirm it and to fully understand the reasons for, and the
dynamics of, the particular observed phenomenon. Our observations pertaining to the robustness
of centrality measures, first introduced in the Results section, are listed here:

1.  Robustness among the four centrality measures is similar (with a partial exception)

2.  Robustness of uniform and cellular is nearly identical

3.  Robustness of core periphery differs greatly from uniform/cellular:

    ■  Higher robustness for core periphery with node-remove or edge-add errors

    ■  Lower robustness for core periphery with node-add or edge-remove errors

4.  Extreme robustness (high) is found in core periphery with edge-add error

5.  Extreme robustness (low) is found in top1 and top3 congruence measures of core
    periphery with node-add or edge-remove error

Our opening observation, that the accuracy of each of the four centrality measures is
similar over different levels of error, is consistent with a finding of the Borgatti, Carley and
Krackhardt (in press) study. Our observations augment and strengthen their finding by showing
the same phenomenon across differing topologies. In general, this phenomenon seems to hold
regardless of the particular topology being studied.

However, we observed a notable exception: a difference exists in the case of eigenvector
centrality for a core periphery network. In this particular case, unlike the other centrality
measures, the eigenvector measure was found to be extremely sensitive. For very small error
levels it was much less robust than the other measures given the same true network and error
parameters, in contrast with the consistency seen in other circumstances.

By its design, a characteristic of the eigenvector measure is that it minimizes the influence of
the near-isolates of a node; the measure gives more weight to global centrality. When equally
distributed random edges are added to the network, the global centrality of nodes may
statistically be impacted more quickly than the local, because any given node is likely to have
more distant nodes than local ones. In the case of core periphery, with its clique-like core and
sparsely tied periphery, it follows that the other centrality measures (degree, betweenness and
closeness) would be impacted less in this scenario because of their mathematical
characteristics.
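To make the contrast between a purely local measure (degree) and a globally weighted one (eigenvector) concrete, the following sketch computes both on a small network. It illustrates the standard definitions only and does not reproduce the experiments; the function names and the dict-of-sets representation are assumptions, and the power iteration is applied to A + I (which has the same eigenvectors as the adjacency matrix A) purely to avoid oscillation on bipartite networks.

```python
import math


def degree_centrality(adj):
    """Degree: the number of direct ties of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def eigenvector_centrality(adj, iterations=200):
    """Eigenvector centrality by power iteration on A + I.

    Each node's score is proportional to the sum of its neighbours' scores,
    so it depends on the whole network rather than on direct ties alone.
    """
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        x = {v: val / norm for v, val in nxt.items()}
    return x


# A five-node path 0-1-2-3-4: nodes 1, 2 and 3 all have degree 2,
# yet the middle node has the highest eigenvector score.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
deg = degree_centrality(path)
eig = eigenvector_centrality(path)
```

Here nodes 1 and 2 are indistinguishable by degree but not by eigenvector centrality, which is why a change elsewhere in the network can reorder an eigenvector ranking while leaving a degree ranking intact.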

We  surmise,  therefore,  that  the  topology  of  a  network  affects  the  comparability  (the 
consistency  or  inconsistency)  of  robustness  profiles  across  different  centrality  measures. 

The consistent and near-exact similarity of robustness profiles for the uniform and the
cellular topologies was a surprise to discover and at first is rather difficult to explain; deeper
investigation into this finding is certainly warranted to explain it adequately. We suspect this
phenomenon may arise from our method of generating the cellular network, which may be
producing a network very similar (in the pattern of edges between the nodes) to the random
networks, even though the two are formed using different generation algorithms.

If true, this suggests that networks generated using altogether different algorithms may,
under certain parameters, in fact result in identical networks that are indistinguishable with
respect to their source algorithm, leading to similar robustness profiles. Further, the highly
differentiated robustness profiles for the core periphery network provide clear evidence that
topology can impact the robustness of centrality measures and that its generating algorithm
differs from the uniform and cellular algorithms.

The extreme levels (high and low) of robustness in particular cases and measures for the core
periphery network provide evidence that the topology combined with the type of error has a
significant effect on the robustness profile. The profiles for the centrality measures in the
case of uniform and cellular networks have a smooth shape, while the core periphery profile,
depending on the particular measure, has either a smooth or an abrupt shape.

We surmise that a network's topology itself is not the sole decisive factor in determining
the robustness of centrality measures; the similarity or dissimilarity between the generative
characteristics of the network topology and the manner in which a measure is formulated
influences the comparability of robustness profiles across different centrality measures.


6.  Future  Research 

As noted earlier, the motivation for this study is the recognition of a major problem inherent
in social network data: the omnipresent measurement error embedded in the data we so carefully
collect and subsequently analyze. In short, the data we depend upon are known to be erroneous.
We have been warned of the possible implications and understand some causes (Killworth &
Bernard, 1976), but only now are we beginning to systematically identify and quantify the
implications. There remains an abundant need for additional research into the robustness of
network measures, since the problem of erroneous data may persist for a long time. In this
section, we put forward three suggestions for future research.

First, while we have shown that topology has an effect on the robustness of centrality
measures, the next question concerns the precise extent to which each of the many different
topologies and their variants distinctively affects the robustness profiles. Behind a single
network topology label, subtle differences in the method for generating a given network may
result in diverse robustness levels. Perhaps it is a characteristic of a topology (and thus of a
family of topologies) that matters, not a specific topology itself. For example, there are many
ways to generate a core periphery network; each variant needs to be explored and individually
related to a specific robustness profile.

Second, as we and others have openly acknowledged, errors in observed social network data
are most likely not truly random in nature. Early research into robustness, such as this study,
has been limited to random error as opposed to more realistic systemic or non-randomly
influenced errors in the data. One notable exception is the Marsden (1990) study, which
examined both random and non-random errors. Certainly, considering non-random error makes the
research much more complicated, but the community will be rewarded with theories based on
richer scenarios.

Third, it should prove invaluable to analysts to have statistically valid confidence levels
and error bounds applicable to their specific observed network. Such quantities might be based
on the known parameters and characteristics of the observed network combined with a priori
information about the true network and the error characteristics. To date, analysts are
constrained to measures determined only from the observed network, and are thus limited to
working with descriptive statistics only. The analysis of networks will take a huge leap
forward when confidence levels can be assigned to collected data, ultimately leading to the
inclusion of p-values with the statistics we calculate from observed data.

7.  Conclusion 

Our experiments and analysis of the data lead us to the conclusion that a network's topology
is, in fact, related to the robustness of the centrality measures. We have shown that, in at
least one specific case (core periphery versus uniform and cellular), different topologies can
have distinctive profiles of robustness for common measures of centrality, while in another case
(uniform versus cellular) the profiles are nearly identical. We have reported evidence that
there is a profound difference in centrality measures' accuracy for core periphery networks
vis-a-vis accuracy for uniform and cellular networks, leading to the conclusion that when
considering the robustness of centrality measures of a network, topology matters. Since our
findings are entirely new to the research community, we call for more research into this
phenomenon. Understanding the impact of erroneous observed data in social network analysis is
critical to accurately projecting results of quantitative analysis to the qualitative assessment
of a social network.

8.  Appendix

The four tables herein provide complete results for an entire experiment involving a true
network of 100 nodes with a density of 50%. Similarly detailed tables for networks of the other
sizes (10, 25, and 50 nodes) and densities (1, 2, 5, 10, 30, 50, 70, and 90%) are available from
the authors.

Average Accuracy Values for 100 Node, 50% Density Network: Node Add